<html>
<head>
	<title>Tuning SQL Relay</title>
	<link rel="stylesheet" href="../css/styles.css">
</head>
<body>
<h1>Tuning SQL Relay</h1>

<ul>
  <li><a href="#localdbconnection">Connect To The Database Using a Local Connection</a></li>
  <li><a href="#unixsocket">Connect To SQL Relay Using a Unix Socket</a></li>
  <li><a href="#handoff">Use a Platform That Supports File Descriptor Passing</a></li>
  <li><a href="#threads">Use a Platform That Supports Threads</a></li>
  <li><a href="#systemparameters">Tweak Various System Parameters</a></li>
  <li><a href="#dontgetcolumninfo">Don't Get Unnecessary Column Info</a></li>
  <li><a href="#resultsetbuffersize">Use Result Set Buffer Size</a></li>
  <li><a href="#memoryusage">Memory Usage</a></li>
  <li><a href="#howmanyconnections">How Many Connections Should I Run?</a></li>
  <li><a href="#howmanycursors">How Many Cursors Should I Allocate?</a></li>
</ul>

<a name="localdbconnection""/><h2>Connect To The Database Using a Local Connection</h2>

<p>One of the most common mistakes that people make when running SQL Relay on
the same machine as the database is not using a "local connection" to the
database.  It's an easy mistake to make, but when remedied, it can result in a
substantial improvement in performance.</p>

<p>Most databases support "remote connections" (usually over an inet socket) and
"local connections" (usually over a unix socket or IPC).  If you're running SQL
Relay on a seperate machine from the database, you have to configure it to
connect to the database using a remote connection.  However, if you're running
SQL Relay on the same machine as the database, then you should configure it to
connect to the database using a local connection if possible.</p>

<h3>MySQL</h3>

<p>MySQL supports local connections over a unix socket.</p>

<p>By default, MySQL is usually configured to accept local connections.  To make
sure MySQL is configuerd to accept local connections, edit the MySQL
configuration file (usually /etc/my.cnf) and add a line like the following to
the <tt>[mysqld]</tt> section if it's not already there:</p>

<blockquote>
socket=/var/lib/mysql/mysql.sock
</blockquote>
<p>Now restart MySQL.</p>

<p>To configure SQL Relay to connect to MySQL using the local connection,
edit the configuration file and make sure that the connectstring for the
MySQL instance uses the socket parameter, as follows:</p>

<blockquote>
user=testuser;password=testpassword;db=testdb;socket=/var/lib/mysql/mysql.sock
</blockquote>
<p>as opposed to:</p>

<blockquote>
user=testuser;password=testpassword;db=testdb;host=localhost;port=5432
</blockquote>
<h3>PostgreSQL</h3>

<p>PostgreSQL supports local connections over a unix socket.</p>

<p>By default, PostgreSQL is configured to accept local connections.  I'm not
sure you can even turn it off.  By default, PostgreSQL listens on
/tmp/.s.PGSQL.5432.  The directory can be overridden by modifying the
PostgreSQL configuration file (usually /var/lib/pgsql/data/postgresql.conf).
The following parameters affect which file PostgreSQL listens on and that
file's properties.  By default these parameters are commented out.  To modify
the parameters, remove the leading # and set the parameter.  By default, the
parameters look like this:</p>

<blockquote>
  <pre>#unix_socket_directory = ''
#unix_socket_group = ''
#unix_socket_permissions = 0777 # octal

#port = 5432
</pre>

</blockquote>
<p>The socket file that PostgreSQL will listen on will be found in /tmp by
default, or in whatever directory is specified by unix_socket_directory.  The
file always start with .s.PGSQL. and end with whatever port is specified with
the port parameter.  Note that the port parameter also determines which inet
port PostgreSQL will listen on.  By default it will have 777 permissions and
be owned by whatever user and group the postmaster process is running as.  You
can override the permissions using the unix_socket_permissions parameter and
the group using the unix_socket_group parameter.</p>

<p>The postmaster process also takes several command line parameters.  The -k
parameter overrides the unix_socket_directory parameter and the -p parameter 
overrides the port parameter.</p>

<p>After you modify postgresql.conf, restart PostgreSQL.</p>

<p>To configure SQL Relay to connect to PostgreSQL using the local connection,
edit the configuration file and make sure that the connectstring for the
PostgreSQL instance uses the host and port parameters as follows:</p>

<blockquote>
user=testuser;password=testpassword;host=/tmp;port=5432;
</blockquote>
<p>as opposed to:</p>

<blockquote>
user=testuser;password=testpassword;host=localhost;port=5432;
</blockquote>
<p>Ordinarily, the host parameter tells SQL Relay what host to connect to, but
if it starts with a / then it indicates what directory to look for the socket
file in.  Likewise, the port parameter ordinarily tells SQL Relay what port to
connect to, but if host starts with a / then it indicates that SQL Relay should
look for a file in the directory specifid by the host parameter starting with
.s.PGSQL. and ending in the port number.</p>

<h3>Oracle</h3>

<p>Oracle supports local connections over IPC using shared memory.</p>

<p>By default, Oracle isn't usually set up to listen for IPC connections.
To set up Oracle to listen for IPC connections, edit
$ORACLE_HOME/network/admin/listener.ora and look for a section like this:</p>

<blockquote>
  <pre>LISTENER =
  (DESCRIPTION_LIST =
    (DESCRIPTION =
      (ADDRESS_LIST =
        (ADDRESS = (PROTOCOL = TCP)(HOST = localhost.localdomain)(PORT = 1521))
      )
    )
  )
</pre>

</blockquote>
<p>Add an additional ADDRESS section as follows:</p>

<blockquote>
  <pre>LISTENER =
  (DESCRIPTION_LIST =
    (DESCRIPTION =
      (ADDRESS_LIST =
        (ADDRESS = (PROTOCOL = IPC)(KEY = EXTPROC))
        (ADDRESS = (PROTOCOL = TCP)(HOST = localhost.localdomain)(PORT = 1521))
      )
    )
  )
</pre>

</blockquote>
<p>Now Oracle's listener is configured to listen for both IPC and inet socket
connections.  Restart Oracle</p>

<p>SQL Relay looks up the ORACLE_SID in the
$ORACLE_HOME/network/admin/tnsnames.ora file and uses the configuration
parameters there to connect to the database.  By default, the connection to the
database is configured to use an inet socket.  To set up an ORACLE_SID to use
IPC instead, edit $ORACLE_HOME/network/admin/tnsnames.ora and look for a
section for the ORACLE_SID like this:</p>

<blockquote>
  <pre>ORA1 =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = localhost.localdomain)(PORT = 1521))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = ora1)
    )
  )
</pre>

</blockquote>
<p>Modify it as follows:</p>

<blockquote>
  <pre>ORA1 =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = IPC)(KEY = EXTPROC))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = ora1)
    )
  )
</pre>

</blockquote>
<p>SQL Relay requires no specific modifications to connect to the database using
IPC instead of an inet socket.  Just configure the connect string to use an SID
that uses IPC to connect to the database.  For example:</p>

<blockquote>
user=testuser;password=testpassword;oracle_sid=ora1;
</blockquote>
<h3>SQLite</h3>

<p>SQLite is different from other databases.  SQL Relay's sqlite connection
uses the sqlite library to access the database file directly.  Effectively,
the database connection is always local.</p>

<h3>IBM DB2</h3>

<p>As far as I can tell, DB2 supports local connections over IPC using
shared memory, semaphores and message queues (though I can't find any
documentation that specifically says that).</p>

<p>By default, clients connect to DB2 databases using local connections.  In
fact, to access a remote instance, you have to set up a stub local instance
which relays to the remote instance and connect to the local instance.</p>

<p>SQL Relay requires no specific modifications to connect to the database.
Just configure the connect string to use the local database.  For example:</p>

<blockquote>
db=testdb
</blockquote>
<h3>Informix</h3>

<p>Informix databases support connections over a wide range of protocols,
including remote connections using TCP/IP sockets and local connections using
shared memory.</p>

<p>Databases are identified in the INFORMIXDIR/etc/sqlhosts file.  The second
parameter for each database entry identifies the protocol to use to connect to
it.  The onsoctcp protocol tells client programs to connect to the
database using a TCP/IP socket and the onipcshm protocol tells client
programs to connect to the database using shared memory.  For a local
connection, use the onipcshm protocol.</p>

<p>Once sqlhosts is configured, just configure SQL Relay to use that database.
For example:</p>

<blockquote>
db=testdb
</blockquote>
<h3>SAP/Sybase</h3>

<p>As far as I can tell, SAP/Sybase only supports client connections over inet
sockets.  If anyone knows differently, send mail to
<a href="mailto:dev@firstworks.com">dev@firstworks.com</a>.</p>

<h3>Firebird</h3>

<p>As far as I can tell, Firebird only supports client connections
over inet sockets.  If anyone knows differently, send mail to
<a href="mailto:dev@firstworks.com">dev@firstworks.com</a>.</p>

<a name="unixsocket""/><h2>Connect To SQL Relay Using a Unix Socket</h2>

<p>SQL Relay supports inet and unix sockets.  If you are runing the SQL Relay
client (your program) and the SQL Relay server on the same machine, make sure
that they are talking to each other over the unix socket.</p>

<p>First make sure that SQL Relay is configured to listen on a unix
socket.  In the configuration file, the instance that is talking to mysql
should have a socket attribute:</p>

<blockquote>
&lt;instance id="mysqltest" port="8006" socket="/tmp/mysqltest.socket" ... &gt;
</blockquote>
<p>In the above example, the SQL Relay server will listen on inet port 8006 and
unix socket /tmp/mysqltest.socket.</p>

<p>Now make sure that your app is using that socket as well.  When you create
a new sqlrconnection, make sure you tell it to use the unix socket:</p>

<blockquote>
sqlrconnection	*con=new sqlrconnection("localhost",8006,"/tmp/mysqltest.socket","user","password",0,1);
</blockquote>
<p>as opposed to:</p>

<blockquote>
sqlrconnection  *con=new sqlrconnection("localhost",8006,NULL,"user","password",0,1);
</blockquote>
<p>(for non-C++ api's the syntax is similar)</p>

<p>The client will first try the unix socket, then if it can't connect, it will
try the host/port.</p>

<p>Make sure to type the filename of the unix socket correctly.  I've seen
several cases where people were really confused why they were getting bad
performance even though they were using unix sockets.  They had just typed the
unix socket file name wrong and it was trying it, failing, then falling back to
host/port.</p>

<a name="handoff""/><h2>Use a Platform That Supports File Descriptor Passing</h2>

<p>When a client connects to the SQL Relay listener process, it waits for an available database connection, then either hands the client off to that process via file descriptor passing, or proxies the incoming client, ferrying data back and forth between it and the database connecton process.</p>

<p>Whether to pass or proxy can be specified by the "handoff" parameter in ths configuration file.  If you omit the handoff parameter or set handoff="pass" then file descriptor passing will be used.  If you set handoff="proxy" then proxying will be used.</p>

<p>File descriptor passing is faster and lighter than proxying, and is the default, but it isn't supported on all platforms.  For example, Linux kernels prior to 2.2 and the Cygwin environment under Windows, do not support file descriptor passing and it is likely that other platforms don't as well, especially older platforms.</p>

<p>If you want the best performance, you must use a platform that supports file descriptor passing.</p>

<a name="threads""/><h2>Use a Platform That Supports Threads</h2>

<p>When a client connects to the SQL Relay listener process, it forks off either a thread or child process and waits for an available database connection.  Support for forking a thread, rather than a process was added in version 0.54.</p>

<p>Whether to fork a thread or process can be specified by the "sessionhandler" parameter in ths configuration file.  If you omit this parameter or set sessionhandler="thread" then a thread will be used.  If you set sessionhandler="process" then a child process will be used.  Note that sessionhandler="thread" became the default in version 0.58.  Previously it defaulted to "process".</p>

<p>Forking a thread is faster and lighter than forking a child process, and is the default (as of version 0.58), but it isn't supported on all platforms, especially older platforms.  Also, if the Rudiments library that SQL Relay is built on was intentionally compiled without thread support then SQL Relay will also not be able to use threads.</p>

<p>If SQL Relay is configured to use threads but can't, then it falls back to forking a child process and prints a warning on the screen when the sqlr-listener process starts.</p>

<p>If you want the best performance, you must use a platform that supports threads.</p>

<a name="systemparameters""/><h2>Tweak Various System Parameters</h2>

<p>Certain system parameters can be tuned to get better performance out of SQL Relay.</p>

<h3>TIME_WAIT timeout</h3>

<p>The first paramater that comes to mind is the TIME_WAIT timeout.  When a TCP 
client disconnects from a server, the socket that the client was connected on
goes into a TIME_WAIT state for typically between a minute and 4 minutes.</p>

<p>For servers serving data over the the unreliable internet, this is probably 
reasonable.  For internal servers, dedicated to serving other internal servers 
on a reliable network, reducing the length of the timeout is probably OK.</p>

<p>Here's why it helps...</p>

<p>The kernel keeps a list of sockets in the TIME_WAIT state.  When the list
is full, failures start to occur.</p>

<p>If your server is getting new client connections faster than it can bleed off
sockets in the TIME_WAIT state, the list will ultimately get full.  Decreasing
the timeout increases the bleed-off rate.</p>

<p>The following instructions illustrate how to change the timeout rate for
Linux and Solaris.  Note that I got these instructions off of the web and have
not tried all of them myself.</p>

<p>For Linux, set the timeout by executing the following command.  In this
example, the timeout is set to 30 seconds.  You should put this command in a
system startup file so it will be executed at boot time.</p>

<blockquote>
  <pre>echo 30 > /proc/sys/net/ipv4/tcp_fin_timeout
</pre>

</blockquote>
<p>You might also try enabling tcp_tw_recycle and tcp_tw_reuse as follows:</p>

<blockquote>
  <pre>echo 1 > /proc/sys/net/ipv4/tcp_tw_reuse
echo 1 > /proc/sys/net/ipv4/tcp_tw_recycle
</pre>

</blockquote>
<p>Those parameters enable fast recycling and reuse of sockets in the
TIME_WAIT state.  I can't find any documentation about them other than the
following though:</p>

<blockquote>
  <pre>tcp_tw_recycle - BOOLEAN
 Enable fast recycling TIME-WAIT sockets. Default value is 0.
 It should not be changed without advice/request of technical
 experts.

tcp_tw_reuse - BOOLEAN
 Allow to reuse TIME-WAIT sockets for new connections when it is
 safe from protocol viewpoint. Default value is 0.
 It should not be changed without advice/request of technical
 experts.
</pre>

</blockquote>
<p>For Solaris, the parameter can be modified while the system is running
using the ndd command to set the number of milliseconds to wait.  These
examples set the timeout to 30 seconds.  You should put these commands in a 
system startup file so they'll be executed at boot time.</p>

<p>For Solaris 7 and later:</p>

<blockquote>
  <pre>ndd -set /dev/tcp tcp_time_wait_interval 30000
</pre>

</blockquote>
<p>For Solaris 2.6 and earlier:</p>

<blockquote>
  <pre>ndd -set /dev/tcp tcp_close_wait_interval 30000
</pre>

</blockquote>
<h3>Port range</h3>

<p>Another paramter that you may want to tweak is the range of available ports.
On Linux, you can display the range by running:</p>

<blockquote>
  <pre>/sbin/sysctl net.ipv4.ip_local_port_range
</pre>

</blockquote>
<p>You can increase this to range from 1024 to 65535 by running the following 
command:</p>

<blockquote>
  <pre>/sbin/sysctl -w net.ipv4.ip_local_port_range="1024 65535"
</pre>

</blockquote>
<p>You should put this command in a system startup file so it'll be executed at
boot time.</p>

<p>I'm not sure what the default port range is or how to change it on other
operating systems.</p>

<a name="dontgetcolumninfo""/><h2>Don't Get Unnecessary Column Info</h2>

<p>If you run select queries but don't need to know the names/types/sizes of
the columns, rather you can just refer to them by number, then you can call
dontGetColumnInfo() before running your query and much less data will be
transmitted from the SQL Relay server to the client.  This is really useful for
speeding up 1 column queries like "select count(*) from mytable".</p>

<a name="resultsetbuffersize""/><h2>Use Result Set Buffer Size</h2>

<p>By default, SQL Relay fetches the entire result set of a query into memory.
SQL Relay was originally written for use with web-based apps, and most web
pages display small result sets so generally it's faster if the entire result
set is buffered.  But, if you run select queries that return really long result
sets, you should use setResultSetBufferSize() to fetch rows in groups instead
of all at once or 1 at a time.  The value to use tends to be query-specific
though so you usually have to do some benchmarking to get it optimized.  If you
fetch too many rows at once, memory has to be allocated which slows things down
(and in really bad cases can cause swapping) but if you fetch too few at once,
it causes too many round trips to the database.  Each query has a sweet spot,
finding it may dramatically improve that query's performance.</p>

<a name="memoryusage""/><h2>Memory Usage</h2>

<p>Many database API's allocate memory internally to store result sets.  However, Oracle, DB2, Informix, SAP/Sybase, FreeTDS, MySQL, Firebird and ODBC allow you to allocate buffers yourself and fetch result sets into these buffers.  When using these databases, SQL Relay preallocates a result set buffer and re-uses it over and over rather than dynamically allocating memory for each result set.  This improves performance at the cost of memory.  There are parameters for tuning the size of the result set buffer though.</p>

<p>The size of the result set buffer is determined by 3 parameters:</p>

<ul>
  <li>The maximum number of columns in the result set.</li>
  <ul>
    <li>Defaults to 256</li>
    <li>Setting to -1 causes buffers to be allocated dynamically</li>
  </ul>

  <li>The maximum size of an individual non-LOB field.</li>
  <ul>
    <li>Defaults to 32768, the maximum size of a varchar and varbinary types</li>
  </ul>

  <li>The number of rows of the result set to fetch at once.</li>
  <ul>
    <li>Defaults to 10</li>
  </ul>

</ul>

<p>The third parameter might be a little confusing.  Some databases only support single-row fetches; you run a query, then fetch 1 row at a time from the database.  Oracle, DB2 and SAP/Sybase support array fetches; you can run a query and then fetch any number of rows from the database in a single round-trip.  This can improve performance significantly.</p>

<p>The default result set buffer size for Oracle, DB2, Informix, and SAP/Sybase is 80mb (256 fields per row * 32768 bytes per field * 10 rows to fetch at once).  Informix, FreeTDS, MySQL, Firebird, and ODBC don't support multi-row fetches so its default buffer size is about 8mb (256*32768*1 bytes).</p>

<p>It is important to note that each cursor contains a result set buffer and each connection opens a configurable number of cursors.</p>

<p>For example, if you open 10 connections with 2 cursors each, then since each cursor will consume 80mb, each connection will consume about 160mb, and all 10 connections will consume about 1.6g.</p>

<p>If that seems like more memory than you'd like to allocate to SQL Relay then you can tune it down.</p>

<p>For the best performance/memory-usage trade-off, the "maximum number of columns in the result set" should be set to the largest number of columns that a query run by your apps could return.  However, this parameter can also be set to -1, causing column buffers to be allocated dynamically for each result set.  This won't perform as well but will conserve memory.</p>

<p>The "maximum size of an individual non-LOB field" parameter should be set to the longest non-LOB field that your app could fetch.</p>

<p>There is no rule of thumb for the "number of rows of the result set to fetch at once" parameter.  More tends to be better, but not always.  You may have to tune this by trial and error.</p>

<p>For Oracle, DB2, and SAP/Sybase, the maxselectlistsize, maxitembuffersize and fetchatonce parameters of the string attribute of the connection tag in the configuration file may be set to control these values at run time.  For Informix, MySQL, Firebird, and FreeTDS, the maxselectlistsize and maxitembuffersize parameters are available but the fetchatonce parameter is not.  When routing queries or sessions, the fetchatonce parameter is available.</p>

<p>See the <a href="configreference.html">SQL Relay Configuration Reference</a> for details.</p>

<p>Possibly more important than the parameters controlling the result set buffer size though, are the cursors and connections parameters as they are both multipliers.  See tips below for each.</p>

<a name="howmanyconnections""/><h2>How Many Connections Should I Run?</h2>

<p>A good rule of thumb is to run as many connections as your database can
handle.</p>

<p>The best way is to determine how many connections to run is to make an
educated guess, let your applications run, monitor the performance of the
SQL Relay server and database and adjust the number of connections
accordingly.</p>

<p>If you are implementing SQL Relay because your database had been 
getting overloaded with too many connections, you may have a pretty good idea of
how many connections your database can handle, and you can use that number as a
starting point.</p>

<p>When just letting your applications run isn't feasable, you can use a
load-testing program like apachebench or LoadRunner to do automated testing.
You'll have to set up scripts and pages to simulate the kind of load that your
applications will place on SQL Relay and the database, but you can get an
accurate idea of how your application will perform end-to-end.  You can let the
load-testing program run, monitor the performance of the SQL Relay server and
database and adjust the number of connections accordingly.</p>

<p>There is also a program in the test/stress directory of the SQL Relay
distribution called <i>querytest</i> that logs into the SQL Relay server,
runs a series of queries and logs out, over and over, as fast as possible.
You can run many instances of <i>querytest</i> simultaneously to simulate
the kind of load your applications might place on SQL Relay and the database.
<i>querytest</i> is OK as a starting point, but to do any serious testing, you
should modify the queries that it runs to more accurately simulate your
applications.</p>

<p>If you are concerned about memory usage, consider allowing connections to scale up and down as necessary by using the connections, maxconnections, maxqueuelength and growby parameters in the configuration file.  See the <a href="configreference.html">SQL Relay Configuration Reference</a> for detailed information on these parameters.</p>

<a name="howmanycursors""/><h2>How Many Cursors Should I Allocate?</h2>

<p>Usually not very many.  Most often, only one.  If your app requires lots of
cursors, then it's as likely as not that it isn't closing or freeing them
properly.</p>

<p>By default, unless setResultSetBufferSize() is called, SQL Relay clients
use a server-side cursor to fetch the entire result set into memory and then
release the cursor.  This way, only one server-side cursor is required,
independent of how many client-side cursors are used.</p>

<p>However, if you configure the client to fetch say, 10 rows at a time, and
run nested queries - eg. run a select, and then run an insert for each row of
the select, then two server side cursors would be required.  One for the outer
select and another for the inner inserts.  This is a legitimate use of more
than 1 cursor.</p>

<p>Sometimes though, an app will allocate 10 cursors, set each to fetch 5 rows
at a time, run a query using each cursor and then free all 10 cursors.  This
would require 10 server-side cursors.  Only one server-side cursor would be
required if: one cursor were allocated and reused, each cursor was closed when
the app was done with it, or each cursor was freed when the app was done with
it.  This is a common problem and, unfortunately, sometimes architectural
constraints make it unavoidable.</p>

<p>SQL Relay has parameters that allow cursors to scale up and down as needed.  If you are concerned about memory usage, consider using the cursors, maxcursors and cursors_growby parameters in the configuration file.  See the <a href="configreference.html">SQL Relay Configuration Reference</a> for detailed information on these parameters.
</p>

</body>
</html>
